Spatiotemporal correlations of handset-based service usages 



Hang-Hyun Jo,¹ Marton Karsai,¹ Juuso Karikoski,² and Kimmo Kaski¹

¹Department of Biomedical Engineering and Computational Science,
Aalto University School of Science, P.O. Box 12200, Finland
²Department of Communications and Networking,
Aalto University School of Electrical Engineering, P.O. Box 13000, Finland

(Dated: March 2, 2013) 

We study spatiotemporal correlations and temporal diversities of handset-based service usages 
by analyzing a dataset that includes detailed information about locations and service usages of 
124 users over 16 months. By constructing the spatiotemporal trajectories of the users we detect 
several meaningful places or contexts for each one of them and show how the context affects the 
service usage patterns. We find that temporal patterns of service usages are bound to the typical 
weekly cycles of humans, yet they show maximal activities at different times. We first discuss their
temporal correlations and then investigate their time-ordering behavior, in which communication
services such as calls are followed by non-communication services such as applications. We also find
that the behavioral overlap network based on the clustering of temporal patterns is comparable to the
communication network of users. Our approach provides a useful framework for handset-based data
analysis and helps us understand the complexity of human behavior enabled by information and
communications technology.



I. INTRODUCTION 

Macroscopic socio-economic phenomena arising from the behavior of large numbers of individuals have been extensively studied in the social, physical, and computational sciences [1–3]. Recent access to large-scale digital datasets on human dynamics and social interaction has enabled us to quantitatively investigate the structure and dynamics of human communication networks. Indeed, researchers have studied various datasets, ranging from email and mobile phone communications to social network services, e.g., Twitter and Facebook [4–11]. Mobile phones or handsets are now actively used to accurately measure or sense human behavior, because handsets equipped with a variety of sensors, including GPS and WiFi, are carried around by their users every day and all day long. Highly resolved location data collected from handsets have recently been used to uncover human mobility patterns [12–16]. The reliability of data collected from handsets, i.e., "behavioral" data, was tested in a series of studies conducted within the framework of MIT's Reality Mining project [17–19]. It was shown that behavioral data are at least comparable to self-reported survey data in terms of the friendship network, and can even capture information that self-reports miss [18].

Handset usage patterns are known to be diverse among users when measured by, to name a few, the number or duration of phone sessions and the amount of data received [20–22]. Within individual handset usage patterns, temporal inhomogeneities due to circadian and weekly cycles have also been reported [10]; these are closely related to spatial inhomogeneities, such as nighttime at home and daytime in the office. Therefore, a comprehensive study needs to identify the context characterizing the situation of the handset user and then to understand how the context affects service usage patterns [23–27]. However, the effect of context on handset-based service usage has been investigated only very recently, and so far mostly at the aggregate level, while the temporal diversity of service usage among users has been ignored [27].

In this paper, we study spatiotemporal correlations of the service usage patterns of individual users by analyzing a handset-based dataset. This dataset was collected from 124 users' handsets over 16 months as a part of the OtaSizzle project at Aalto University, Finland [28]. Software installed on the handsets collected information about the handsets' locations and the usage of various services, including web domain visits, applications, emails, voice calls, and short message services, with a temporal resolution of seconds and a spatial resolution of mobile network base stations. After constructing the spatiotemporal trajectories of the users, we identify several contexts that are meaningful to them by using the context detection method [26]. Other methods include, for example, places of interest or meaningful locations [29, 30] and eigenmode analysis [31–33]. Then, we find correlations between the spatiotemporal trajectories and the service usage patterns. We observe similarity and diversity in the temporal patterns of service usages and discuss their temporal correlations, the time-ordering behavior between services, and the behavioral overlap network based on the clustering results. Our approach provides a useful framework for handset-based data analysis, and hence it could contribute to the better design of information and communications technology (ICT) enabled social environments and services.

This paper is organized as follows. In Section II we describe the data collection and preparation methods. In Section III several contexts for each user are identified by means of the context detection method applied to the user's spatiotemporal trajectory. In Section IV we uncover the spatiotemporal correlations and the similarity and diversity in temporal patterns of the service usages. Finally, we summarize the results with concluding remarks in Section V.

FIG. 1: Recording frequencies at mobile network base stations by all users: (a) over the world, (b) in Finland, and (c) in the Helsinki municipal area. Higher frequency is denoted by warmer color. In (c) the size of each circle is logarithmic in frequency.



II. HANDSET-BASED DATASET 
A. Data collection method 

The handset-based dataset in this study was collected by the MobiTrack software installed on the Nokia Symbian smartphones of 183 participants or users from September 2009 to December 2010, i.e., over a period of about 16 months. All users were students and staff members of Aalto University, Finland, and were identified as early adopters of mobile phones and services [31]. The dataset was anonymized so that no personal information of the users could be obtained. We consider only the 124 users whose overall duration of handset usage is longer than 30 days; see Section III for details.

The dataset consists of two kinds of information: locations and service usages. The spatial resolution of locations is limited to the physical area covered by each mobile network base station, i.e., a cell, denoted by $c$. Whenever the handset connected to a new cell, or otherwise every half an hour, the identifier of the cell connected to the handset was recorded with a timestamp $t$ at one-second resolution. Each cell can be located in geographical space by a unique pair of latitude and longitude. The geographic information for cells and the maps used in Figs. 1 and 6 were collected as part of the OpenNetMap project and from open databases [35–37]. For all users we have 5596041 records at 99206 different cells. Although only 29.0% of cells could be located in geographical space, they correspond to 91.3% of records. Figure 1 shows all located cells over the world, in Finland, and in the Helsinki municipal area. In this way, the detailed spatiotemporal trajectory of each user could be constructed as a sequence of cell records $\{(c_k, t_k)\}$, where $k$ denotes the ordered index of records.

FIG. 2: Original and filtered distributions of inter-event time for web domain visits by all users. The inter-event time is defined as the time interval between consecutive web domain visits by the same user. The peaks due to automatic events by the browser have been successfully suppressed after filtering.

For service usage data we consider five services: web domain visit (web), application (app), email, voice call (call), and short message service (SMS). Each service usage, or event, was recorded with a timestamp at one-second resolution together with relevant service-specific information. In the case of web domain visits, the URL (Uniform Resource Locator) was extracted, and whether it was visited via the browser or a widget was recorded. Only the applications visible in the foreground of the handset were recorded, so that no process or application running in the background was considered. The records of communication services, such as email, call, and SMS, include information on whether the user was the initiator or the receiver of the communication event, and on the communication partner if available. For more information regarding the data collection method, see [34].



B. Data preparation method 

FIG. 3: Schematic diagram for deriving temporal boundaries $\{(c_k, t_k^{\rm in}, t_k^{\rm out})\}$ (colored boxes) from a sequence of cell records $\{(c_k, t_k)\}$ (vertical black lines). It is assumed that the user stays in the cell $c_k$ from the moment $t_k^{\rm in}$ to $t_k^{\rm out}$. See the text for details.

The service usage dataset contains events mostly generated by users, but it also contains automatic events generated by the operating system of the handsets. In order to observe pure human behavior, we systematically filtered out these automatic events. However, some spurious regularities still remain in the web dataset. In the cases of google.com, facebook.com, and so on, once a website is connected, the browser might visit the same site automatically for periodic updates and synchronization of accounts until it is disconnected. To resolve this issue, we obtain the distribution of the inter-event time $\tau$, defined as the time interval between consecutive web domain visits by the same user. Several sharp peaks are found at specific inter-event times, where each peak is mostly related to a single webpage. We remove all the events leading to those inter-event times, except for event trains consisting of only two events with $\tau = 10$ seconds, because some trains of only two events separated by 10 seconds can also be generated by users. As new regularities become visible after filtering, we apply this method recursively until the peaks are considerably suppressed, which removes approximately 25% of all web events. Figure 2 shows that this filtering method does not change the overall characteristics of the inter-event time distribution of the web dataset.
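The filtering step above can be sketched as a single pass over one user's sorted web-visit timestamps. This is a minimal, hypothetical sketch: it assumes the spurious peak values have already been read off the inter-event time distribution, it treats a "train" as a maximal run of events separated by the same peak value, and it drops every event of such a train. These are our assumptions, and the function name is illustrative. The recursion in the text would re-estimate the peaks and call the pass again.

```python
def filter_periodic_events(times, peaks, keep_gap=10):
    """One filtering pass over one user's sorted web-visit timestamps (seconds).

    times    -- sorted event timestamps of a single user
    peaks    -- set of inter-event times (s) identified as spurious peaks
    keep_gap -- two-event trains with this gap are kept (they can be user-made)
    """
    n = len(times)
    remove = [False] * n
    i = 0
    while i < n - 1:
        gap = times[i + 1] - times[i]
        if gap not in peaks:
            i += 1
            continue
        # Extend the train while successive gaps repeat the same peak value.
        j = i + 1
        while j < n - 1 and times[j + 1] - times[j] == gap:
            j += 1
        train_size = j - i + 1
        if not (train_size == 2 and gap == keep_gap):
            for m in range(i, j + 1):
                remove[m] = True  # assumption: every event of the train is dropped
        i = j
    return [t for t, drop in zip(times, remove) if not drop]
```

For example, `filter_periodic_events([0, 60, 120, 180, 500, 510, 2000], {60, 10})` drops the four-event 60-second train and keeps the isolated 10-second pair.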

We also ignore some user-generated application events associated with other service usages, corresponding to 17% of all events. For example, the user opens the messaging application when sending or receiving SMSs. These application events might lead to artificial correlations between different service usages. In addition, corrupted events, less than 0.1% of the whole dataset, have been ignored or manually corrected. Finally, the service usage dataset contains 792971 web domain visits, 433726 application events, 17976 emails, 79779 calls, and 79283 SMSs.



III. CONTEXT DETECTION FROM SPATIOTEMPORAL PATTERN

FIG. 4: Locations and service usage patterns of sample user 81 during a typical Friday and Saturday. The first and second rows represent the cells and the contexts assigned to the cells, respectively. Home, Office, Other meaningful place, and Elsewhere are denoted by red, blue, green, and gray, respectively. Different shades of the same color indicate different cells belonging to the same context. Service usage events are denoted by vertical lines in the rows of web, app, email, call, and SMS (from the third row to the bottom).

In order to detect the contexts for each user, we construct the user's spatiotemporal trajectory from a sequence of cell records $\{(c_k, t_k)\}$. This requires inferring the user's location between consecutive timestamps of cell records. From a sequence of cell records, we derive the temporal boundaries $\{(c_k, t_k^{\rm in}, t_k^{\rm out})\}$ of the user's trajectory, meaning that the user stays within the area covered by cell $c_k$ from the moment $t_k^{\rm in}$ to $t_k^{\rm out}$; see Fig. 3. When $t_{k+1} - t_k \le 2t_c$, it is assumed that the user stays in the cell $c_k$ until $t_k^{\rm out} = \frac{1}{2}(t_k + t_{k+1})$ and then in the cell $c_{k+1}$ from $t_{k+1}^{\rm in} = t_k^{\rm out}$. Here we set $t_c$ to half an hour, i.e., the time interval for regular cell recording. A time interval between consecutive timestamps longer than $2t_c$ implies that the handset may have been turned off, used in offline or airplane mode, or unable to detect any cell nearby. If $t_{k+1} - t_k > 2t_c$, the user is considered to stay in the cell $c_k$ until $t_k^{\rm out} = t_k + t_c$ and in the cell $c_{k+1}$ from $t_{k+1}^{\rm in} = t_{k+1} - t_c$. Hence, the location is unknown between $t_k^{\rm out}$ and $t_{k+1}^{\rm in}$. Then, the total time spent, i.e., the duration, in each cell $c$ is obtained as

$$d_c = \sum_{\{k \mid c_k = c\}} \left( t_k^{\rm out} - t_k^{\rm in} \right). \tag{1}$$

If the sum of durations over all recorded cells, $D = \sum_c d_c$, is less than 30 days, the user is excluded from further analysis, leaving 124 available users. The mean and standard deviation of $D$ for the available users are 121 ± 63 days.
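The boundary rules and Eq. (1) can be illustrated with a short sketch. It assumes sorted `(cell, t)` records with times in seconds. How the very first and last records are bounded is not specified above, so here they are treated like a long gap; that choice and the function names are hypothetical.

```python
from collections import defaultdict

T_C = 1800  # t_c: half an hour in seconds, the regular recording interval


def temporal_boundaries(records, t_c=T_C):
    """Map sorted cell records [(c_k, t_k), ...] to [(c_k, t_in, t_out), ...]."""
    bounds = []
    n = len(records)
    for k, (cell, t) in enumerate(records):
        if k == 0:
            t_in = t - t_c  # assumption: no earlier record, treat as a long gap
        else:
            t_prev = records[k - 1][1]
            t_in = (t_prev + t) / 2 if t - t_prev <= 2 * t_c else t - t_c
        if k == n - 1:
            t_out = t + t_c  # assumption: no later record, treat as a long gap
        else:
            t_next = records[k + 1][1]
            t_out = (t + t_next) / 2 if t_next - t <= 2 * t_c else t + t_c
        bounds.append((cell, t_in, t_out))
    return bounds


def cell_durations(bounds):
    """Eq. (1): d_c is the sum of t_out - t_in over all k with c_k = c."""
    d = defaultdict(float)
    for cell, t_in, t_out in bounds:
        d[cell] += t_out - t_in
    return dict(d)
```

A user would then be kept only if `sum(cell_durations(bounds).values())` reaches 30 days (2592000 s).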

In addition, we observe back-and-forth changes within a short time span between two cells covering neighboring areas. These can occur even without any real movement of the handset if the handset is located at the boundary of two neighboring cells. To filter out this noisy behavior, the involved cells can be clustered by a sandwich clustering method. Here we consider only one type of sandwich, with four records involving two cells, i.e., $c_k = c_{k+2} \neq c_{k+1} = c_{k+3}$ with $t_l^{\rm out} - t_l^{\rm in} < t_c$ for $l = k, \ldots, k+3$. Whenever this type of sandwich is detected, every $c_k$ in the temporal boundaries is replaced by, or merged into, $c_{k+1}$ if $d_{c_{k+1}} > d_{c_k}$, and vice versa. Consequently, some geographically neighboring cells can be clustered into one representative cell, which from now on is treated in the same way as a normal cell. For example, the first row in Fig. 4 shows the temporal boundaries of user 81 during a typical Friday and Saturday. Note that clustering cells for one user is independent of other users' records.
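The sandwich rule can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. We read the short-stay condition as "each of the four stays lasts less than $t_c$". All occurrences of the less visited cell are relabeled. Ties in $d_c$ are broken by cell label so that chains of merges always terminate. The function name is hypothetical.

```python
def sandwich_merge(bounds, durations, t_c=1800):
    """Relabel cells involved in A-B-A-B sandwiches of short stays.

    bounds    -- [(cell, t_in, t_out), ...] in time order
    durations -- {cell: d_c} from Eq. (1)
    """
    def weight(c):
        # Larger d_c wins; the label breaks ties so merges cannot form a cycle.
        return (durations.get(c, 0.0), str(c))

    cells = [c for c, _, _ in bounds]
    merged_into = {}
    for k in range(len(bounds) - 3):
        a, b = cells[k], cells[k + 1]
        if a == b or cells[k + 2] != a or cells[k + 3] != b:
            continue
        # Assumed reading of the condition: all four stays are shorter than t_c.
        if all(bounds[l][2] - bounds[l][1] < t_c for l in range(k, k + 4)):
            keep, drop = (a, b) if weight(a) > weight(b) else (b, a)
            merged_into[drop] = keep

    def representative(c):
        while c in merged_into:  # follow chains such as A -> B -> C
            c = merged_into[c]
        return c

    return [(representative(c), t_in, t_out) for c, t_in, t_out in bounds]
```

Each merge points from a cell to one with strictly larger weight, so `representative` cannot loop forever.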

FIG. 5: Rank curves of cells for all users and for sample users 5 and 81. Here the rank curve is defined as the sequence of durations in cells sorted in descending order.

FIG. 6: Contexts detected for sample users 5 and 81 in a map of the Helsinki municipal area. Each cell is represented by a circle whose radius reflects the duration in that cell. The cells are identified as Home (red), Office (blue), Other meaningful place (green), or Elsewhere (gray).

We find spatiotemporal inhomogeneities in the trajectories of handsets on the individual basis as well as at the aggregate level. As an illustrative example, we obtain the rank curve $d(r)$, defined as the duration in the $r$th cell when cells are sorted in descending order of $d_c$. The rank curve for all users is highly skewed: the first few cells, including one on the Otaniemi campus of Aalto University, were visited for more than a few months, while 88.9% of cells were visited for less than one hour, as shown in Fig. 5. The same inhomogeneities are also observed for individual users. For example, Fig. 5 shows the rank curves for users 5 and 81, who were selected to show representative behavior.
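For reference, the rank curve and the share of briefly visited cells (the quantity behind the 88.9% figure, with a one-hour threshold) follow directly from the durations $d_c$ of Eq. (1). The sketch below assumes durations in seconds, and the function names are hypothetical.

```python
def rank_curve(durations):
    """d(r): cell durations sorted in descending order (r = 1 first)."""
    return sorted(durations.values(), reverse=True)


def share_below(durations, threshold=3600):
    """Fraction of cells whose duration is below `threshold` seconds."""
    values = list(durations.values())
    return sum(v < threshold for v in values) / len(values)
```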

The heavily visited cells are expected to cover places meaningful to the handset user, such as home and office. Since service usage patterns might be affected by the different characteristics of meaningful places, it is important to identify the context characterizing the situation of the user. Here the notion of context is preferred to that of meaningful place because the time and place of handset usage are not independent but correlated, e.g., nighttime at home and daytime in the office [18]. Each cell is assigned one of five contexts: Home, Office, Other meaningful place (Other), Elsewhere (Else), and Abroad. One context can be assigned to several cells. The identifier of a cell contains the mobile country code (MCC), by which the Abroad context is assigned to cells outside Finland. For the cells within Finland, we obtain more detailed durations for each cell $c$:

1. the duration on weekdays, $d_{c,\rm wd}$,

2. the duration on weekdays between 0 AM and 6 AM, $d_{c,0-6}$, and

3. the duration on weekdays between 10 AM and 4 PM, $d_{c,10-16}$.

Now we describe the criteria for assigning the contexts other than Abroad. A cell is detected as Elsewhere (Else) if the duration in that cell is negligible compared to the total duration:

$$d_c / D < \theta_{\rm elsewhere} = 0.02. \tag{2}$$

For example, Else is assigned to the cells along highways. The value of $\theta_{\rm elsewhere}$ was chosen so as to leave only 0.2% of cells, i.e., 3.73 cells per user, for the other contexts. A cell is detected as Office if the user spends a considerable amount of time in that cell during working hours on weekdays:

$$d_{c,\rm wd} / d_c > \theta_{\rm weekday} = 0.8 \tag{3}$$

and

$$d_{c,10-16} / d_{c,\rm wd} > \theta_{\rm worktime} = 0.5. \tag{4}$$

With the above threshold values, at least one Office has been detected for more than half of the users. Note that most users were students, so they might not have any regular place to visit during working hours. Next, Home is assigned to a cell if the user spends a considerable amount of time in that cell during nighttime and free time, i.e., the time outside working hours, on weekdays:

$$d_{c,0-6} / d_{c,\rm wd} > \theta_{\rm nighttime} = 0.1 \tag{5}$$

and

$$d_{c,10-16} / d_{c,\rm wd} < \theta_{\rm freetime} = 0.3. \tag{6}$$

With the above threshold values, at least one Home has been detected for all users except two. Many users turn out to have more than one Home, such as the user's own home and his or her parents' home. Finally, the remaining cells are detected as Other meaningful place (Other). Figure 6 shows the locations of the detected contexts for sample users in the Helsinki municipal area. We show two sample users' contexts together to avoid privacy issues.
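The decision rules of Eqs. (2)–(6) can be written as a single function. This sketch uses the threshold values quoted above. It assumes all durations are in the same unit and that $D$ is passed as `total`. The Office test precedes the Home test, following the order in which the criteria are introduced. The function name and argument layout are hypothetical.

```python
THETA_ELSEWHERE = 0.02  # Eq. (2)
THETA_WEEKDAY = 0.8     # Eq. (3)
THETA_WORKTIME = 0.5    # Eq. (4)
THETA_NIGHTTIME = 0.1   # Eq. (5)
THETA_FREETIME = 0.3    # Eq. (6)


def assign_context(d_c, d_wd, d_0_6, d_10_16, total, in_finland=True):
    """Return the context of one cell from its durations (any common time unit).

    d_c: total duration in the cell; d_wd: on weekdays; d_0_6 and d_10_16:
    on weekdays between 0-6 AM and 10 AM-4 PM; total: the user's D.
    """
    if not in_finland:  # decided from the mobile country code (MCC)
        return "Abroad"
    if d_c / total < THETA_ELSEWHERE:
        return "Else"
    # Office is tested before Home, following the order of the criteria in the text.
    if d_wd > 0 and d_wd / d_c > THETA_WEEKDAY and d_10_16 / d_wd > THETA_WORKTIME:
        return "Office"
    if d_wd > 0 and d_0_6 / d_wd > THETA_NIGHTTIME and d_10_16 / d_wd < THETA_FREETIME:
        return "Home"
    return "Other"
```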

Our context detection method is validated by the weekly patterns of duration for different contexts, obtained for sample users and at the aggregate level, as depicted in Fig. 7. For example, user 5, for whom no Other context is detected, shows a very regular pattern, especially on weekdays: at Home in the nighttime, in Office during working hours, and at Else when moving between Home and Office. The weekly patterns of user 81 are comparable to the temporal boundaries in terms of detected contexts, as depicted in the second row of Fig. 4. The weekly patterns of duration aggregated over all users show the overall behavior. Durations at Home, Office, Other, and Else account for 66.8%, 7.0%, 8.5%, and 14.0% of the total duration of all users, respectively.

FIG. 7: Weekly patterns of duration in hours for the different contexts for users 5 and 81 and for all users (from top to bottom). The typical weekly cycles of humans are observed.



IV. SPATIOTEMPORAL CORRELATIONS OF SERVICE USAGES

We investigate correlations between users' spatiotemporal trajectories and their service usage patterns. Here five services, namely web domain visit (web), application (app), email, voice call (call), and short message service (SMS), are considered, and each service is denoted by $s$. The spatiotemporal correlation of service usages for user $i$ is fully characterized by the number of events of service $s$ in cell $c$ at time $t$, denoted by $n_{is}(c,t)$. To gain a contextual understanding of the correlations we consider contexts instead of cells, i.e., $n_{is}(C,t) = \sum_c n_{is}(c,t)$, where the summation is over all cells $c$ detected as context $C$.



A. Contextual correlations of service usages

We first focus on the contextual correlations of service usages with $n_{is}(C) = \sum_t n_{is}(C,t)$. Since services have qualitatively different characteristics, the numbers of events of different services cannot be compared directly, but only in terms of fractions and intensities of usage. The fraction of service usage is defined as

$$f_{is}(C) = \frac{n_{is}(C)}{\sum_{C'} n_{is}(C')}. \tag{7}$$

Figure 8 (left) shows the fractions for sample users 5 and 81 as well as their means over all users with standard errors, estimated by the bootstrap method. The handset of user 5 has never been abroad, and no Other context is detected for this user. For user 5 all service usages are more active at Home and Office than at Else, which is very different from the service usage patterns of user 81. Due to the diversity of service usage patterns among users, no general conclusion can be drawn on an individual basis. However, the means with standard errors show that all service usages are most active at Home, while they are relatively inactive in the other contexts. Given the aggregate durations for different contexts obtained in Section III, this finding can be explained by the fact that a longer duration in a context means a higher chance of service usage there.

Accordingly, instead of the fractions of service usages we consider these fractions divided by the corresponding fractions of duration:

$$I_{is}(C) = \frac{f_{is}(C)}{d_{iC} \big/ \sum_{C'} d_{iC'}}, \tag{8}$$

where $d_{iC}$ denotes the duration of user $i$ in context $C$. The results are shown in Fig. 8 (right). Despite the diversity among users, the mean intensities of different services in the same context have, to some extent, similar values. The large mean intensity of email usage in Office might be due to users preferring emails to calls or SMSs in classes or laboratories during working hours. The large mean intensity of web usage at Else could result from users killing time by surfing webpages while on the move. Users abroad also appear to use SMSs more than other communication services. Finally, for all services, only the mean intensities at Home turn out to be less than 1, i.e., the least active, which could be partly because users have many other activities to do at Home.
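Eqs. (7) and (8) and the bootstrap standard error of the user average can be sketched as follows. This assumes per-context event counts and durations for one user and one service. Eq. (8) is implemented in the form written above (usage share over time share), and the function names and resampling settings are illustrative choices.

```python
import random


def fractions(counts):
    """Eq. (7): share of one user's events of one service in each context."""
    total = sum(counts.values())
    return {ctx: n / total for ctx, n in counts.items()}


def intensities(counts, durations):
    """Eq. (8): usage share divided by time share for each context."""
    f = fractions(counts)
    total_d = sum(durations.values())
    return {ctx: f.get(ctx, 0.0) / (d / total_d)
            for ctx, d in durations.items() if d > 0}


def bootstrap_se(values, n_resamples=1000, seed=0):
    """Standard error of the mean over users, by resampling users with replacement."""
    rng = random.Random(seed)
    means = []
    for _ in range(n_resamples):
        sample = [rng.choice(values) for _ in values]
        means.append(sum(sample) / len(sample))
    mu = sum(means) / len(means)
    return (sum((m - mu) ** 2 for m in means) / (len(means) - 1)) ** 0.5
```

With 60% of events at Home but 80% of the time spent there, the Home intensity is 0.75, i.e., below 1, which is the pattern reported above for the user average.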



FIG. 8: Contextual correlations of service usages for users 5 and 81 and for all users (from top to bottom). Fractions (left) and intensities (right) of service usages are defined in Eqs. (7) and (8), respectively. Standard errors are also provided for the user-averaged statistics.



B. Temporal correlations and time-ordering of service usages

We now analyze the temporal correlations of service usages in terms of $n_{is}(t) = \sum_C n_{is}(C,t)$, where the summation is over all contexts except Abroad, because service usage abroad cannot be considered normal, as shown in Fig. 8. We first obtain the weekday and weekend patterns of service usages as

$$n_{is}^{\rm wd}(t) = \sum_{k} n_{is}(t + kT_d), \tag{9}$$

$$n_{is}^{\rm we}(t) = \sum_{k'} n_{is}(t + k'T_d), \tag{10}$$

for $0 \le t < T_d$ with $T_d = 1$ day. Here $k$ and $k'$ denote the indices of weekdays and weekends, respectively. The weekday and weekend event rates of service $s$ for user $i$, $\rho_{is}^{\rm wd}(t)$ and $\rho_{is}^{\rm we}(t)$ (Eqs. (11) and (12)), are obtained by normalizing $n_{is}^{\rm wd}(t)$ and $n_{is}^{\rm we}(t)$ with the weights $a = 1/5$ and $a' = 1/2$, respectively. In addition, we obtain the weekday and weekend event rates averaged over all users.
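The folding in Eqs. (9) and (10) amounts to counting events per hour of day, separately for weekdays and weekends. In this sketch, timestamps are assumed to be Unix seconds. The time zone is a parameter because the dataset would need Finnish local time, and UTC is only a placeholder default. The normalization with $a$ and $a'$ is not reproduced because its exact form is not given above. The function name is hypothetical.

```python
from datetime import datetime, timezone


def weekly_patterns(timestamps, tz=timezone.utc):
    """Eqs. (9) and (10) with one-hour bins: fold event times onto one day.

    timestamps -- Unix times (s) of one user's events of one service
    tz         -- time zone used to decide hour and weekday (UTC is a placeholder)
    Returns (n_wd, n_we), two lists of 24 hourly counts.
    """
    n_wd = [0] * 24
    n_we = [0] * 24
    for ts in timestamps:
        local = datetime.fromtimestamp(ts, tz)
        pattern = n_wd if local.weekday() < 5 else n_we  # Mon-Fri vs Sat-Sun
        pattern[local.hour] += 1
    return n_wd, n_we
```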

In Fig. 9 we show the individual event rates for sample users 5 and 81 as well as the event rates averaged over all users. The overall behavior of the individual and user-averaged event rates reflects the typical weekly cycles of humans: usage is more active in the daytime and on weekdays, and less active in the nighttime and on weekends. From the user-averaged event rates, we find that email (call) is used more around noon (in the late afternoon) on weekdays, while email (call) is used less (more) than other services in the weekend daytime. Since most users in our dataset were students and staff members of the university, they might avoid making or receiving calls in classes or laboratories in the weekday daytime, and might instead use other communication services, such as email and SMS. On the other hand, users might use calls more than email outside classes or laboratories on weekends.

FIG. 9: Weekday and weekend service usage patterns for users 5 and 81 and for all users (from top to bottom), showing the similarity and diversity among users. The bin size is set to one hour.

FIG. 10: Pearson correlation coefficients among service usages for users 5 and 81 and for all users (from top to bottom), obtained from weekday (left) and weekend (right) event rates. Positive and negative correlations are represented by orange and gray lines, respectively.

FIG. 11: (a) Distributions of the time interval $\Delta t_{ss'}$ between consecutive events of different services $s$ and $s'$. (b) Diagram of the time-ordering behavior between services based on the distributions of time interval.

To investigate the temporal correlations between service usages for each user, we calculate the Pearson correlation coefficient (PCC) using the event rates of services $s$ and $s'$ for user $i$:

$$r_{i,ss'} = \frac{\sum_t [\rho_{is}(t) - \bar\rho_{is}][\rho_{is'}(t) - \bar\rho_{is'}]}{\sqrt{\sum_t [\rho_{is}(t) - \bar\rho_{is}]^2}\,\sqrt{\sum_t [\rho_{is'}(t) - \bar\rho_{is'}]^2}}, \tag{13}$$

where $\bar\rho_{is} = \frac{1}{T_d}\sum_t \rho_{is}(t)$. For the PCC on weekdays and on weekends, $\rho_{is}^{\rm wd}(t)$ and $\rho_{is}^{\rm we}(t)$ are used, respectively. In most cases the values of the PCC turn out to be positive (not shown here), mainly due to the typical weekly cycles of humans mentioned above. To correct for such cycles, for weekdays and weekends separately we consider the de-seasoned event rates defined as

$$\Delta\rho_{is}(t) = \rho_{is}(t) - \frac{1}{S_i}\sum_{s'} \rho_{is'}(t), \tag{14}$$

where $S_i$ denotes the number of services that user $i$ has used.
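Eqs. (13) and (14) can be sketched in plain Python. The sketch assumes one user's rates are stored as `{service: [rate per time bin]}` for either weekdays or weekends. Services with no activity are excluded from the $S_i$ used services, and the function names are hypothetical.

```python
import math


def pearson(x, y):
    """Eq. (13): Pearson correlation of two equally long rate series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return float("nan")  # undefined for a constant series
    return cov / math.sqrt(vx * vy)


def deseason_services(rates):
    """Eq. (14): subtract, bin by bin, the user's mean rate over used services.

    rates -- {service: [rate per time bin]} for one user (weekdays or weekends)
    """
    used = {s: r for s, r in rates.items() if any(r)}  # the S_i used services
    n_bins = len(next(iter(used.values())))
    mean = [sum(r[t] for r in used.values()) / len(used) for t in range(n_bins)]
    return {s: [r[t] - mean[t] for t in range(n_bins)] for s, r in used.items()}
```

In a toy case where call and SMS rates are `[1, 3, 5]` and `[3, 1, 5]`, the raw PCC is 0.5 because both peak in the last bin. After de-seasoning, the PCC is -1, exposing the substitutive relation that the shared cycle had masked.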

As shown in Fig. 10, the values of the PCC obtained for the de-seasoned event rates show both similar and distinct behavior among users as well as between weekdays and weekends. For example, for user 5 the strongly positive correlation between call and SMS usages on weekdays turns slightly negative on weekends. This result is consistent with the temporal patterns depicted in Fig. 9. A positive (negative) correlation between services, arising from their use at the same (different) times of the week, can be interpreted to mean that those services are complementary (substitutive) [38]. We then obtain and compare the distributions of the PCC over all users for each pair of services. The mean values for the web-app and call-SMS pairs (app-email pair) are slightly positive (negative) on weekdays and become slightly negative (positive) on weekends. All other pairs have negative mean values. The result for positive correlations is inconclusive due to the large standard errors of the PCC, up to 0.05. However, for the pairs of services with large negative correlations, such as the web-call and web-SMS pairs, we can argue that those services might be used in a substitutive way. To compare the correlations on weekdays and on weekends, we have conducted the Kolmogorov-Smirnov test. The distributions of the PCC on weekdays and on weekends are found to be significantly different for the pairs web-app (p-value less than 0.005), app-email (0.03), email-call (0.03), email-SMS (0.03), and call-SMS (0.02). This list contains all the pairs whose mean changes sign from weekdays to weekends.
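In practice `scipy.stats.ks_2samp` would be the usual tool for this comparison. The dependency-free sketch below shows what the test computes: the two-sample statistic is the largest gap between the empirical CDFs of, for example, the per-user PCC values of one service pair on weekdays and on weekends. The p-value here uses the asymptotic Kolmogorov distribution with a commonly used small-sample correction, so it is only an approximation for small samples. The function names are hypothetical.

```python
import math


def ks_statistic(x, y):
    """Two-sample Kolmogorov-Smirnov statistic: max gap between empirical CDFs."""
    x, y = sorted(x), sorted(y)
    n, m = len(x), len(y)
    i = j = 0
    d = 0.0
    while i < n and j < m:
        v = min(x[i], y[j])
        while i < n and x[i] == v:  # step both CDFs past all ties at v
            i += 1
        while j < m and y[j] == v:
            j += 1
        d = max(d, abs(i / n - j / m))
    return d


def ks_pvalue(d, n, m, terms=100):
    """Asymptotic p-value via the Kolmogorov distribution (large-sample approximation)."""
    en = math.sqrt(n * m / (n + m))
    lam = (en + 0.12 + 0.11 / en) * d
    if lam < 0.2:
        return 1.0  # the series is ~1 here and converges slowly
    total = sum(2 * (-1) ** (k - 1) * math.exp(-2 * k * k * lam * lam)
                for k in range(1, terms + 1))
    return min(max(total, 0.0), 1.0)
```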

FIG. 12: k-means clustering results of users' weekly patterns. We have used k = 10 and plotted only a few dominant clusters, with the cluster size in parentheses; see Table I for details. The bin size is set to one hour.

For a more detailed, i.e., event-based, analysis of correlations among service usages, we obtain the distribution of the time interval between two consecutive or simultaneous events of different services by the same user. Precisely, the time interval for a pair of services $s$ and $s'$ is defined by $\Delta t_{ss'} = t_{s'} - t_s$ with event timings $t_s$ and $t_{s'}$. As shown in the upper panels of Fig. 11(a), the distributions for some service pairs have a peak at a negative value of $\Delta t_{ss'}$, both on weekdays and on weekends. This indicates that the event of service $s$ follows that of service $s'$. On the other hand, the distributions for the other pairs of services do not show any distinct peaks, implying no temporal correlation. This time-ordering behavior could mean that one service usage effectively induces another service usage; however, we cannot investigate such a process with our dataset. In summary, communication services, such as email, call, and SMS, are followed by non-communication services, i.e., web and app, as depicted in Fig. 11(b). We also obtain the distributions of time interval for different contexts. We find overall similar time-ordering behavior (not shown here), except that email is followed by web at Home and that app does not follow communication services abroad. Note that the event-based analysis cannot be directly compared to the analysis of aggregated weekly patterns.
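One plausible reading of the interval construction is sketched below. Adjacent events of different services in a user's merged timeline contribute $\Delta t_{ss'} = t_{s'} - t_s$ to the ordered pair $(s, s')$ and the opposite sign to $(s', s)$, so a peak at negative $\Delta t_{ss'}$ means $s$ tends to come after $s'$. Both the adjacency rule and the function name are assumptions. Histogramming each list would give the distributions of Fig. 11(a).

```python
from collections import defaultdict


def interval_samples(events):
    """Collect Δt_{ss'} = t_{s'} - t_s for adjacent events of different services.

    events -- [(t, service), ...] for one user, in any order
    Returns {(s, s2): [Δt, ...]}; a negative Δt_{s s2} means s came after s2.
    """
    samples = defaultdict(list)
    ordered = sorted(events)
    for (t1, s1), (t2, s2) in zip(ordered, ordered[1:]):
        if s1 == s2:
            continue  # only pairs of different services are considered
        samples[(s1, s2)].append(t2 - t1)  # s2 happened t2 - t1 after s1
        samples[(s2, s1)].append(t1 - t2)  # the same pair seen from s2
    return dict(samples)
```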



C. Clustering and overlaps in temporal patterns of service usage

FIG. 13: Pearson correlation matrices of users' de-seasoned event rates. These matrices support the validity of the k-means clustering results in Table I. The user index has been sorted according to the corresponding cluster index, and blank spaces are due to totally inactive users.

As shown above, the temporal patterns of service usage are diverse from one user to another, while some users still show similar behavior. To investigate the similarity and diversity of weekly patterns for each service, we apply the k-means clustering method [33] to the weekly event rates $\rho_{is}(t) = \{\rho_{is}^{\rm wd}(t), \rho_{is}^{\rm we}(t)\}$. To correct for the typical weekly cycles of each service (not of each user), we use the de-seasoned event rates

$$\Delta\rho_{is}(t) = \rho_{is}(t) - \frac{1}{N_s}\sum_{j} \rho_{js}(t), \tag{15}$$

where $N_s$ denotes the number of users showing any activity in service $s$, and the sum runs over these users. We similarly define the service-averaged event rates of each user for the clustering, denoted by avg. In each case we set the number of clusters to $k = 10$, and the cluster index is denoted by $g = 0, \ldots, 9$. Clustering has been conducted 2000 times with different initial conditions, and here we present the result maximizing the quality of clustering, or validity index, defined as the minimum inter-cluster distance divided by the sum of intra-cluster distances [39].
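The clustering step can be sketched as below, combining the de-seasoning of Eq. (15) with plain k-means restarts scored by the validity index. The validity index is described only in words above. Here, as assumptions, the inter-cluster distance is the Euclidean distance between centroids, and a cluster's intra-cluster distance is the mean distance of its members to the centroid. Initialization by sampling k data points and all function names are likewise our choices.

```python
import math
import random


def deseason_users(rates):
    """Eq. (15): subtract the mean pattern over the N_s users active in service s.

    rates -- {user: [rate per weekly time bin]} for one service
    """
    active = {u: r for u, r in rates.items() if any(r)}
    n_bins = len(next(iter(active.values())))
    mean = [sum(r[t] for r in active.values()) / len(active) for t in range(n_bins)]
    return {u: [r[t] - mean[t] for t in range(n_bins)] for u, r in active.items()}


def _dist(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def _assign(points, centers):
    return [min(range(len(centers)), key=lambda g: _dist(p, centers[g])) for p in points]


def kmeans(points, k, rng, n_iter=100):
    """Plain Lloyd iterations starting from k randomly chosen points."""
    centers = [list(p) for p in rng.sample(points, k)]
    for _ in range(n_iter):
        labels = _assign(points, centers)
        new_centers = []
        for g in range(k):
            members = [p for p, lab in zip(points, labels) if lab == g]
            new_centers.append([sum(c) / len(members) for c in zip(*members)]
                               if members else centers[g])
        if new_centers == centers:
            break
        centers = new_centers
    return _assign(points, centers), centers


def validity(points, labels, centers):
    """Minimum inter-centroid distance over the sum of mean intra-cluster distances."""
    used = sorted(set(labels))
    if len(used) < 2:
        return 0.0
    inter = min(_dist(centers[a], centers[b])
                for i, a in enumerate(used) for b in used[i + 1:])
    intra = 0.0
    for g in used:
        members = [p for p, lab in zip(points, labels) if lab == g]
        intra += sum(_dist(p, centers[g]) for p in members) / len(members)
    return inter / intra if intra > 0 else float("inf")


def best_clustering(points, k=10, n_restarts=2000, seed=0):
    """Repeat k-means and keep the run with the highest validity index."""
    rng = random.Random(seed)
    best = None
    for _ in range(n_restarts):
        labels, centers = kmeans(points, k, rng)
        score = validity(points, labels, centers)
        if best is None or score > best[0]:
            best = (score, labels, centers)
    return best
```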

TABLE I: k-means clustering results for weekly patterns of
service usage with $k = 10$. $q$ and $N_s$ denote the cluster
index and the number of available users for service $s$,
respectively.

FIG. 14: Overlap network constructed from the clustering
results for all services. Circle, square, and hexagonal nodes
represent female, male, and unknown-gender users,
respectively. Each thick solid black line denotes a link
between users who belong to the same clusters for all
services. The other colored lines denote links between users
who belong to the same clusters for all but one service: web
(dashed thick blue), app (dotted thin red), email (dotted
thick green), call (solid thick cyan), SMS (dashed thin
violet), or a service unused by either user (solid thin
gray). This figure was generated using Cytoscape v2.8.1 [40].

The clustering results are summarized in Table I, and
the weekly patterns of only a few dominant clusters are
shown in Fig. 12. Only one dominant cluster is found for
each of web and email usage, implying similar patterns
among users. The weekly patterns of app, call, and SMS
usage are clustered into more than one dominant cluster.
Compared to the largest cluster ($q = 0$) of call usage, the
second largest cluster ($q = 1$) is characterized by larger
activity during weekday daytime and weekend mornings.
The behavioral difference between the dominant clusters
of SMS usage is also evident: the largest cluster ($q = 0$)
represents evening-type users, while the second largest
cluster ($q = 1$) represents morning-type users on weekdays.
For the service-averaged usage patterns, the second largest
cluster ($q = 1$) shows larger (smaller) activity in the
daytime on weekdays (on weekends) than the largest cluster
($q = 0$). To check the validity of the clustering results, we
obtain the Pearson correlation matrices of the de-seasoned
event rates $\Delta p_{is}(t)$. All the matrices support the
k-means clustering results, see Fig. 13. We also tested the
effect of the number of means, $k$, on the clustering and
found that the results are qualitatively similar apart from
the number of small or outlying clusters.
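The validity check of Fig. 13 amounts to computing the user-by-user Pearson correlation matrix of the de-seasoned rates and reordering users by cluster index, so that good clusters appear as blocks of high correlation along the diagonal. A minimal NumPy sketch follows; the helper names are hypothetical, and summarizing the block structure by mean within-cluster versus between-cluster correlation is an illustrative choice, not a statistic reported in the text.

```python
import numpy as np


def sorted_correlation_matrix(deseasoned, labels):
    """Pearson correlations between users, reordered by cluster index.

    deseasoned: (n_users, n_bins) de-seasoned rates of active users.
    labels: (n_users,) cluster indices q from k-means.
    """
    order = np.argsort(labels, kind="stable")
    corr = np.corrcoef(deseasoned[order])  # each row (one user) is a variable
    return corr, order


def within_between(corr, sorted_labels):
    """Mean off-diagonal correlation within clusters and between clusters."""
    same = sorted_labels[:, None] == sorted_labels[None, :]
    off_diag = ~np.eye(len(sorted_labels), dtype=bool)
    return corr[same & off_diag].mean(), corr[~same].mean()


x = np.array([[3, 1, 0, 0], [0, 0, 3, 1], [3, 0, 1, 0], [0, 1, 3, 0]], dtype=float)
labels = np.array([0, 1, 0, 1])
corr, order = sorted_correlation_matrix(x, labels)
print(within_between(corr, labels[order]))  # within-cluster mean exceeds between-cluster mean
```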

FIG. 15: Topological overlap as a function of behavioral
overlap. An overall positive correlation between the two
overlaps is observed.

Finally, to gain insight into the overall structure of
temporal correlations among users and services, we
construct an overlap network based on the clustering
results. This leads to a network of overlapping
communities [41], where the nodes and link weights of the
network represent users and their overlaps, respectively.
Precisely, the behavioral overlap of two users, say $i$ and
$j$, is defined as the number of services in which they
belong to the same cluster:

$$O^{\rm b}_{ij} = \sum_{s} \delta(q_{is}, q_{js}). \tag{16}$$



Here $q_{is}$ denotes the cluster index of user $i$ for
service $s$, and the Kronecker delta $\delta(q, q')$ equals 1
if $q = q'$ and 0 otherwise. Figure 14 shows the overlap
network with 436 links of $O^{\rm b}_{ij} = 4$ and 5. A link
with behavioral overlap $O^{\rm b}_{ij} = 5$, denoted by a
thick black line, implies that the two users belong to the
same clusters for all services, i.e. they are fully
synchronized. We find cliques consisting only of fully
synchronized users, which we call synchronized cores. The
largest synchronized core, with 9 users, is closely related
to the second largest synchronized core, differing only in
their cluster of call usage. These cores are also connected
to many other users, though not as part of a synchronized
core. This agglomerate structure may be induced by the
relatively homogeneous demographics of the users in our
dataset. However, we note that the clustering was applied
to the de-seasoned event rates, from which the
user-averaged temporal behavior has been subtracted.
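Eq. (16) and the notion of synchronized cores can be illustrated with a short sketch. It assumes, following the caption of Fig. 14, that a service unused by either user contributes nothing to the overlap, and that each user's cluster assignments are stored as a dictionary from service name to cluster index $q$ (a hypothetical data layout). Because $O^{\rm b}_{ij} = 5$ means identical cluster indices in all five services, full synchronization is transitive among users who use every service, so the cliques of thick black links can be found by grouping users with identical index tuples.

```python
from collections import defaultdict
from itertools import combinations

SERVICES = ("web", "app", "email", "call", "SMS")


def behavioral_overlap(qi, qj):
    """Eq. (16): number of services in which users i and j share a cluster index.

    qi, qj: dicts service -> cluster index; a missing service means the user
    never used it and it contributes 0 (assumption based on Fig. 14).
    """
    return sum(1 for s in SERVICES if s in qi and s in qj and qi[s] == qj[s])


def overlap_links(clusters, min_overlap=4):
    """Links (i, j) -> O^b_ij for all pairs with O^b_ij >= min_overlap."""
    links = {}
    for i, j in combinations(sorted(clusters), 2):
        o = behavioral_overlap(clusters[i], clusters[j])
        if o >= min_overlap:
            links[(i, j)] = o
    return links


def synchronized_cores(clusters):
    """Groups of users with identical cluster indices in every service."""
    groups = defaultdict(list)
    for user, q in clusters.items():
        if all(s in q for s in SERVICES):
            groups[tuple(q[s] for s in SERVICES)].append(user)
    return [sorted(g) for g in groups.values() if len(g) > 1]
```

Keeping only links with $O^{\rm b}_{ij} \ge 4$ (the default `min_overlap`) corresponds to the network drawn in Fig. 14.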

We compare the behavioral overlap network based on
the clustering results with the communication network of
users. The communication network can be constructed
from the call and SMS datasets, which contain information
on communication partners. Only 67 of the 124 users and
205 links between them are identified. The topological
overlap of a link $ij$ is defined as [6]

$$O^{\rm t}_{ij} = \frac{|\Lambda_i \cap \Lambda_j|}{|\Lambda_i \cup \Lambda_j| - 2}, \tag{17}$$



where $\Lambda_i$ denotes the set of neighbors of node $i$.
$O^{\rm t}_{ij}$ equals 1 if $i$ and $j$ have exactly the same
neighbors apart from each other, and 0 if they have no
neighbors in common. Figure 15 shows an overall positive
correlation between behavioral and topological overlaps,
implying that connected users sharing more common
neighbors show more similar weekly patterns of service
usage. Thus, the behavioral overlap network based on
service usage can be used to reveal the communication
network structure of users.
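Eq. (17) and the comparison in Fig. 15 can be sketched in a few lines. The sketch assumes an undirected communication network stored as a dictionary from user to neighbor set, orderable user IDs, and behavioral overlaps keyed by the sorted pair `(i, j)`; these layouts and the function names are hypothetical. Setting $O^{\rm t}_{ij} = 0$ when $i$ and $j$ are linked only to each other (zero denominator) is a convention chosen here, not one stated in the text. Averaging $O^{\rm t}_{ij}$ over links with the same $O^{\rm b}_{ij}$ is one simple way to display the trend shown in Fig. 15.

```python
from collections import defaultdict


def topological_overlap(adj, i, j):
    """Eq. (17) for a link ij: |L_i & L_j| / (|L_i | L_j| - 2).

    adj: dict node -> set of neighbors (undirected, so j in adj[i] and i in adj[j]).
    The union contains i and j themselves, hence the "- 2".
    """
    ni, nj = adj[i], adj[j]
    denom = len(ni | nj) - 2
    if denom <= 0:  # i and j are linked only to each other (convention: 0)
        return 0.0
    return len(ni & nj) / denom


def mean_topological_by_behavioral(adj, behavioral):
    """Average O^t over communication links, grouped by the pair's O^b value."""
    sums, counts = defaultdict(float), defaultdict(int)
    for i in adj:
        for j in adj[i]:
            if i < j and (i, j) in behavioral:
                o_b = behavioral[(i, j)]
                sums[o_b] += topological_overlap(adj, i, j)
                counts[o_b] += 1
    return {o: sums[o] / counts[o] for o in sorted(sums)}


adj = {1: {2, 3, 4}, 2: {1, 3, 4}, 3: {1, 2}, 4: {1, 2}}
print(topological_overlap(adj, 1, 2))  # 1.0: users 1 and 2 share all other neighbors
```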



V. SUMMARY 

We have investigated spatiotemporal correlations and
temporal diversities of service usage by analyzing a
handset-based dataset collected from 124 users over
16 months. The dataset consists of locations and service
usages. After constructing the precise spatiotemporal
trajectory of each user from the location dataset, we
identify several meaningful places or contexts by means
of a context detection method. The contexts considered are
Home, Office, Other meaningful place, Elsewhere, and
Abroad. We showed how the context affects the service
usage patterns of users, including web domain visits
(web), applications (app), email, voice calls (call), and
the short message service (SMS).

In this study we have found similarities and diversities
of weekly patterns among users and services, in terms
of temporal correlations, time-ordering behavior between
services, and the overlap network based on clustering.
Services used at the same (different) times of the week
show positive (negative) correlations, which can be
interpreted as the services being complementary
(substitutive) to each other. By conducting an event-based
analysis instead of analyzing weekly patterns, we observe
time-ordering behavior between services: communication
services, i.e. email, call, and SMS, are followed by
non-communication services, i.e. web and app. Finally, the
similarity and diversity of weekly patterns of service
usage enable us to classify users into several clusters,
e.g. characterized by morning-type or evening-type usage
patterns, for all services except web and email. The
behavioral overlap network constructed from the
clustering results can be used to reveal the communication,
or real social, network structure of users.

Our findings on the spatiotemporal correlations of service
usage patterns in different contexts help us better
understand human behavior and its implications. They are
also important for the better design of information and
communications technology (ICT) enabled social
environments and services. However, more detailed analysis
with higher resolution is required to reveal the underlying
mechanism or the origin of the spatiotemporal correlations.